A start at opening the door to support for other languages. #103

Closed
paul90 wants to merge 1 commit


paul90
Member

@paul90 paul90 commented Mar 7, 2015

Just a start for review

a link to Åsnes generates a request URL: http://localhost:3000/snes.json?random=14cc3a4d&title=%C3%85snes

title provides something that a server might want to work with to find the page's data.
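A minimal sketch of how such a request URL could be assembled. It assumes the existing slug rule simply turns whitespace into dashes and drops everything outside A-Z, a-z, 0-9 and "-", which is consistent with Åsnes becoming snes above. The function names are hypothetical, not the client's actual API.

```python
import re
from urllib.parse import quote


def legacy_slug(title: str) -> str:
    """ASCII-only slug: whitespace becomes '-', other non-ASCII-alphanumerics are dropped."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def page_request_url(origin: str, title: str, cache_buster: str) -> str:
    """Build the page request, passing the percent-encoded title next to the slug."""
    slug = legacy_slug(title)
    return f"{origin}/{slug}.json?random={cache_buster}&title={quote(title, safe='')}"


if __name__ == "__main__":
    print(page_request_url("http://localhost:3000", "Åsnes", "14cc3a4d"))
```

The slug alone loses the leading Å, so the title parameter is the only part of the request that still identifies the page unambiguously.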

@WardCunningham
Member

It would seem that we no longer need the link attribute, data-page-name="#{slug}", applied in lib/resolve.coffee. I'm not sure how to test that this is true.

@paul90
Member Author

paul90 commented Mar 8, 2015

Maybe a different tack...

When we fetch a page, we already have a title-to-slug lookup table in the sitemap. So the only real issues are with page creation and with ensuring that the sitemap is up to date. As we only create pages either on the origin server or in local storage, there is little reason why this change could not be a breaking one that introduces a level of cooperation between the client and server when a page is initially created or forked.

@WardCunningham
Member

Good point. When a sitemap is present and the link text matches a sitemap entry then this mapping should override that of asStub. We should define text matching to be a case independent match when we know enough about the character set to apply this transformation.
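The override described here could look roughly like the sketch below. It assumes the sitemap is a list of entries with "title" and "slug" fields and uses Python's `str.casefold()` as a stand-in for "case independent match"; the fallback slug rule and all names are illustrative assumptions.

```python
import re


def fallback_slug(title: str) -> str:
    """Slug derived from the link text alone (assumed ASCII-only rule)."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def resolve_slug(link_text: str, sitemap: list) -> str:
    """Prefer the slug of a sitemap entry whose title matches case-insensitively."""
    wanted = link_text.casefold()
    for entry in sitemap:
        if entry["title"].casefold() == wanted:
            return entry["slug"]
    # No sitemap match: fall back to the slug computed from the text.
    return fallback_slug(link_text)


if __name__ == "__main__":
    sitemap = [{"title": "Åsnes", "slug": "aasnes"}]
    print(resolve_slug("åsnes", sitemap))
    print(resolve_slug("Welcome Visitors", sitemap))
```

Because the server chooses the slug stored in the sitemap, this indirection would also let a server number its pages or use any other naming scheme.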

We now have at least two ways to "paint our way" out of this corner. I don't see any problem with applying both when possible.

@WardCunningham WardCunningham mentioned this pull request Mar 25, 2015
@egonelbre

I dislike the necessity of having the sitemap. Currently one of my sitemaps is 1MB, which isn't huge, but is certainly an inconvenience on mobile devices. E.g. what if you wanted to use wikipedia.org as a part of federated wiki?

Maybe we could let the servers handle resolving the names instead? For example, when you request "/世界" the server would use an HTTP redirect to send you to "/world". The normalized slug / URL would also always be present in the page itself, so that a newly created link would always refer to the correct slug/title.
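A small sketch of the redirect decision, assuming the server keeps some table from alternative names to canonical slugs. The table, the function name, and the choice of status 302 are assumptions made for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical alias table a server might maintain: requested name -> canonical slug.
ALIASES = {"世界": "world"}


def redirect_for(request_path: str):
    """Return (status, location) when the path is a known alias, else None."""
    name = unquote(request_path.lstrip("/"))
    canonical = ALIASES.get(name)
    if canonical is None or canonical == name:
        return None
    return 302, "/" + quote(canonical)


if __name__ == "__main__":
    print(redirect_for("/%E4%B8%96%E7%95%8C"))
    print(redirect_for("/world"))
```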

@WardCunningham
Member

Maybe you could try the paul90/utf-8-pagename branch and tell us how it works for you?

I would be interested in how interoperable you could make your server. When provided a title you would have all the information that the client has. You would be responsible for case-insensitive matching for those alphabets that have case. You would also be expected to match conventional slugs when handling CORS requests from clients that don't have this modification.

I still think the sitemap approach has merit and would allow a server to number pages as some wiki implementations want to do. If you choose to not offer a sitemap then this indirection will not be available to you. I haven't thought through this case carefully yet. It seems that employing an alternate slug algorithm has similar interoperability considerations.

@egonelbre

I took a look at the changes, and it doesn't seem to have the history copying - but of course it could be added. I simply cannot see whether that approach would look good on the titlebar. Also I'm using a different client.

The handling on the server side would be pretty trivial, i.e.:

  • check whether you have the exact page needed
  • check whether you have the page with your own slugification approach (i.e. normalize, remove spaces ... whatever you consider equivalent)
  • check whether any of the pages has the old-format-slug
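The three checks above can be sketched as follows. This assumes pages are held in a dict keyed by their stored name, each with a "title" field. `server_slug` is a placeholder for whatever normalization a server considers equivalent, and `legacy_slug` is an assumed ASCII-only rule for the old format.

```python
import re
import unicodedata
from typing import Dict, Optional


def legacy_slug(title: str) -> str:
    """Old-format slug (assumed): whitespace to '-', non-ASCII-alphanumerics dropped."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def server_slug(name: str) -> str:
    """Hypothetical server normalization: NFC, case-folded, whitespace runs to '-'."""
    normalized = unicodedata.normalize("NFC", name).casefold()
    return re.sub(r"\s+", "-", normalized.strip())


def find_page(requested: str, pages: Dict[str, dict]) -> Optional[dict]:
    # 1. Exact match on the stored name.
    if requested in pages:
        return pages[requested]
    # 2. Match under the server's own notion of equivalence.
    wanted = server_slug(requested)
    for name, page in pages.items():
        if server_slug(name) == wanted:
            return page
    # 3. Match a request that uses the old-format slug of some page title.
    for page in pages.values():
        old = legacy_slug(page["title"])
        if old and old == requested:
            return page
    return None
```

The third step is what keeps older clients working: they would still ask for snes and receive the page titled Åsnes.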

@WardCunningham
Member

How different is your client?

@egonelbre

The current progress is here: https://github.com/raintreeinc/kbclient. Not complete and I haven't yet had time to completely fix some of the issues. I'm currently still migrating the first prototype to the new infrastructure. Essentially the client is a completely different codebase, but behavior is similar.

Also, it has different goals than fedwiki, so it doesn't have (exactly) the same feature-set. Of course I try to keep it inter-operable with fedwiki protocol. The gist of the KnowledgeBase is "federated wiki between different people groups". Essentially this is an effort to merge usual help pages with engineering information and user provided wiki information.

I liked the navigation style of the Federated Wiki client, so I started from there. It provides a very fast way of navigating complex information. The Federation part gives really good boundaries for managing different wikis and their security, while at the same time allowing one-way access.

@WardCunningham
Member

Another approach might be for the client to try a unicode-friendly slug first and if that doesn't work, try the original slug when different. Can you provide us a table of sample titles and how they would convert to the two slug formats in question?
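A sketch of that try-then-fall-back order. `unicode_slug` is a hypothetical unicode-friendly rule invented for this example (it is not the slug code from kbclient), `legacy_slug` is an assumed ASCII-only rule, and `fetch` stands in for whatever function retrieves a page by slug.

```python
import re
import unicodedata
from typing import Callable, List, Optional


def legacy_slug(title: str) -> str:
    """Original-style slug (assumed): whitespace to '-', non-ASCII-alphanumerics dropped."""
    dashed = re.sub(r"\s", "-", title)
    return re.sub(r"[^A-Za-z0-9-]", "", dashed).lower()


def unicode_slug(title: str) -> str:
    """Hypothetical rule: keep letters and digits of any script, dash everything else."""
    text = unicodedata.normalize("NFC", title).lower()
    return re.sub(r"[\W_]+", "-", text).strip("-")


def candidate_slugs(title: str) -> List[str]:
    """Unicode-friendly slug first, then the original slug when it differs."""
    first, second = unicode_slug(title), legacy_slug(title)
    candidates = [first]
    if second and second != first:
        candidates.append(second)
    return candidates


def fetch_page(title: str, fetch: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Try each candidate slug in order and return the first page found."""
    for slug in candidate_slugs(title):
        page = fetch(slug)
        if page is not None:
            return page
    return None
```

For plain ASCII titles both rules agree, so only one request is made; the second request happens only for titles where the two formats actually diverge.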

@egonelbre

These should give the general idea https://github.com/egonelbre/fedwiki/blob/master/slug_test.go#L9. kbclient contains the slugification code for javascript as well.

@egonelbre

I'm worried about my suggested slugification as well... Mainly, there may be some letters that aren't present in the current Unicode tables, which means that in the future you would need to start upgrading those tables for clients, otherwise it won't work for all cases. With a server-based approach you don't have that problem, because the server knows what pages it has and can appropriately discard those pages.

Of course there is a problem with the server approach: if you update your naming/slugification scheme, links to your site will break.

@WardCunningham
Member

@egonelbre I thought your slugs looked nice and handled some edge cases like duplicate or leading/trailing dashes that were meant for the original but didn't get implemented in the rush to get the project started.

I notice that you spell out some symbols. Is that to avoid specific meaning in a url while avoiding the %xx notation?

I also notice that you pass the slash (/) with some reference to hierarchical names. Are they a requirement of your application? Or is this something you like from other wiki? I have avoided namespace concepts hoping that federation would be sufficient and more "natural".

@egonelbre

The reason I am replacing symbols is to avoid problems with URLs. Essentially I had titles such as * operator, @ operator, so simply removing them wasn't an option. I don't like replacing them with %xx also, because they won't look nice.
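To illustrate the idea of spelling out symbols while also collapsing duplicate and leading/trailing dashes, here is a rough sketch. The symbol names in the table are invented for this example and are not taken from the linked slug tests.

```python
import re

# Hypothetical spellings; the real mapping may differ.
SYMBOL_NAMES = {"*": "star", "@": "at"}


def readable_slug(title: str) -> str:
    """Spell out known symbols, then dash-separate and trim the result."""
    text = title.lower()
    for symbol, name in SYMBOL_NAMES.items():
        # Pad with spaces so the spelled-out name becomes its own word.
        text = text.replace(symbol, f" {name} ")
    # Any run of non-letter, non-digit characters collapses into a single dash.
    text = re.sub(r"[\W_]+", "-", text)
    # Drop dashes left over at either end.
    return text.strip("-")


if __name__ == "__main__":
    for title in ["* operator", "@ operator", "  Hello --  World  "]:
        print(repr(title), "->", readable_slug(title))
```

With this rule, "* operator" and "@ operator" stay distinct instead of both collapsing to "operator".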

I initially was keeping the slash because I had multiple federation end-points under a single address, e.g. /help/, /wiki/, /dev/. Of course, now I'm changing it to be properly distributed. The main use-case I can see for paths is generated services and converted pages:

  1. you are exporting a directory as a federated server
  2. you take an existing html page structure and convert to federated wiki

One case I have is that we have multiple application versions and each one has a separate version of the help pages, so it would be nice to serve them from under help.raintreeinc.com/500, help.raintreeinc.com/400... instead of help500.raintreeinc.com (or something similar)... that approach also reduces the DNS maintenance overhead.

Essentially regarding /, I haven't yet decided either way - it is useful for specific cases, but when you are creating a personal wiki it isn't necessary.

@egonelbre

Forgot one of my use cases... providing a citations listing, e.g. /citations/xyz is a page that links to all pages that cite or reference page /xyz. I can't do that on the client side because the whole site content is +100MB. Similarly /tags/xyz, for all pages that have a tag xyz.

@paul90
Member Author

paul90 commented Mar 26, 2015

Somewhat off topic, regarding slugs and internationalization, but related

Elsewhere, WardCunningham/Smallest-Federated-Wiki#412, there are some initial thoughts on changing the story serialization. This is somewhat connected insofar as it might open a few possibilities. While I'm not sure that I would go quite the same route having thought about this for nearly a year I would probably use something more like http://fed.wiki.org/#welcome-visitors&[email protected]. Which is what I've used with the narrative view.

Not quite sure how services like citations and tags would mix with the current URL structure. While they could replace view on the origin server, wanting to request them from other servers is more tricky, as we then have either 2 or 3 parts for each page in the URL. Add in hierarchy support and there could not only be more parts, but also some ambiguity about whether we have a hierarchy or are calling a service to create the page.


I'm really not sure if I've seen a slugify routine that I am really happy with. They all seem to have some shortcomings, the best remove accents rather than dropping letters as well as replacing ligatures.

I wonder if it would not be better to simply URL encode the page title - most modern browsers will do their thing and make it readable. It is then up to the server how it maps this onto the name it will use for storage. The big downside is that the slug is used to refer to the page in many places, so this would be a big change.

@egonelbre

The citations and tags wouldn't really be separate services; they are simply regular pages with a specific structure. Other clients don't have to know that there even are special pages, but if they know that they exist, they can take advantage of them. Also a correction: the full path I'm using is /index/citations/xyz and /index/tags/xyz.

I'm currently using http://localhost/#┃/home┃/1-up-labels┃//fed.wiki.org/welcome-visitors for the lineup (copy paste to location-bar, it looks better in there). Removing accents doesn't really work for 世界. And yes, I have to agree with slugification on the client side, it's just hard to make sure it's future proof and later changing it would cause even more problems.

I also tried the @ approach, but //server/page seemed more readable. Using an @ would enable [email protected]/search, although specifying links to them would be more complicated, and it also seems like reinventing URIs.
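A sketch of how a lineup fragment in that style might be parsed. It assumes that each entry is introduced by the ┃ separator, that entries starting with // name a remote server followed by a page, and that all other entries are pages on the origin. This interpretation is inferred from the example URL, not from a specification.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

SEPARATOR = "┃"


def parse_lineup(url: str) -> List[Tuple[Optional[str], str]]:
    """Return (site, page) pairs; site is None for pages on the origin server."""
    fragment = urlsplit(url).fragment
    lineup = []
    for item in fragment.split(SEPARATOR):
        if not item:
            continue  # the fragment starts with a separator, giving an empty first item
        if item.startswith("//"):
            site, _, page = item[2:].partition("/")
            lineup.append((site, page))
        else:
            lineup.append((None, item.lstrip("/")))
    return lineup


if __name__ == "__main__":
    url = "http://localhost/#┃/home┃/1-up-labels┃//fed.wiki.org/welcome-visitors"
    print(parse_lineup(url))
```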

I'm not exactly sure what you mean by that downside of "slug is used to refer to the page in many places". Could you elaborate?

@paul90
Member Author

paul90 commented Mar 26, 2015

> I'm not exactly sure what you mean by that downside of "slug is used to refer to the page in many places". Could you elaborate?

Just musing that if we moved to using url encoding for making server requests, and doing the slug generation on the server, that there are knock-on effects in the client.

The more I look at this, the more I get the feeling that all the slugify routines are a relic from the past, before any internationalization of the web. The different national versions of Wikipedia look to use a mix of URL encoding and a minimal set of character replacements (replacing a space with _ being the obvious one). So they have URLs like https://el.wikipedia.org/wiki/%CE%91%CE%BD%CE%B4%CF%81%CE%AD%CE%B1%CF%82_%CE%9C%CE%B9%CE%B1%CE%BF%CF%8D%CE%BB%CE%B7%CF%82 which is rendered in the address bar as https://el.wikipedia.org/wiki/Ανδρέας_Μιαούλης, and of course I could just type the Greek in the address bar. The server would have to do some work to translate the URL encoding into a storage location.
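The encoding described here is easy to reproduce with the standard library. The sketch below assumes only two rules, spaces become underscores and everything else is percent-encoded as UTF-8, and adds NFC normalization as an extra assumption so that visually identical titles encode the same way. The function names are hypothetical.

```python
import unicodedata
from urllib.parse import quote, unquote


def title_to_path(title: str) -> str:
    """Wikipedia-style path segment: spaces to '_', the rest percent-encoded UTF-8."""
    normalized = unicodedata.normalize("NFC", title.strip())
    return quote(normalized.replace(" ", "_"), safe="")


def path_to_title(segment: str) -> str:
    """Inverse mapping a server could apply before looking up its storage location."""
    return unquote(segment).replace("_", " ")


if __name__ == "__main__":
    path = title_to_path("Ανδρέας Μιαούλης")
    print("https://el.wikipedia.org/wiki/" + path)
    print(path_to_title(path))
```

Note that a title which itself contains an underscore would not survive the round trip under these two rules alone.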

@paul90
Member Author

paul90 commented Jul 11, 2016

I'm closing this, as there has been no activity in over a year.

@paul90 paul90 closed this Jul 11, 2016
@replaid

replaid commented Oct 20, 2022

I am creating a wiki farm for an international project related to permaculture. The ability to title pages with non-Latin characters is an absolute must. Absolutely everything else about Federated Wiki is squarely aligned with this project's needs.

Naively, just jumping into this thread 6 years after it was closed, I like the approach of imitating Wikipedia's URL-encoding-based solution on the front end and letting the back end worry about how to persist it. Punycode would be another possible encoding that might offer a path for compatibility.

I am motivated to fix this for my community and would really love to do so in a way that fixes it for the world and fits with this project.

@almereyda

Hi @replaid !

Glad you are joining the conversation here. To ease the conversation and development process, though, I would suggest opening a new issue, preferably in the overarching project https://github.com/fedwiki/wiki/issues (as GitHub discussions are disabled).

I am myself responsible for taking care of the legacy from Silke Helfrich and her fabulous work with David Bollier on the pattern language of commoning, published in Federated Wikis at:

There are a few more language editions coming up next (French, Spanish, Greek, and what have you)

We are entering uncharted land here, so it will be on us to provide the Wiki community not only with requirements, but also with good practices, possible workarounds and eventually original development. The subjects of translation, and more broadly, internationalisation (i18n) will bring much joy and nuts to crack, so we ought to continue to work slowly, but surely on this.

I went ahead and filed a new issue for this subject:

Please feel free to continue our conversation there.

@WardCunningham
Member

WardCunningham commented Oct 22, 2022

@replaid can you give us some examples of "pages titled with non-Latin characters an absolute must." Are you expecting collaboration to work across languages? How? For the sake of this conversation, perhaps you can give us English translations too.

@replaid

replaid commented Oct 22, 2022

The author below is compiling permaculture articles in Russian and we anticipate collaboration between speakers of Russian, English, and Spanish in the near term, possibly other languages as well (the next one on the list would be right-to-left…). This would be titled "Guilds" in English.

Here it is with an English-language translation too:

http://john.permakultura.wiki/view/welcome-visitors/miron.permakultura.wiki/gildii/view/guilds

The limitation of Latin letters for the page title is the only roadblock I am aware of to having a normal wiki workflow and interaction. But we anticipate a fairly broad-based selection, i.e. not all academics—I believe it will be pretty important to be able to support a fully native-language workflow.

@replaid

replaid commented Oct 22, 2022

I would want to name this page in its own script and cite it the same way: [[Гильдии]]. Writers in other languages are used to flipping their keyboard back and forth to type web addresses, etc., so this won't represent much of a problem. It would be nice to be able to make links using keys that are available on the native language (Russian has more letters, so curly braces and square brackets aren't easily available), but this is a far lower priority than being able to use non-Latin letters in page titles.

Russian is a phonetic alphabet, and can be very effectively transliterated into Latin letters (in this case Gil'dii)—this happens in the slugs on Russian news sites—but I don't know if it's best to pursue that path across languages.

I have been looking at Punycode, Nameprep, and stringprep and I think it has potential—the main thing I'm aware of that I need to think through is how it would interact with wiki when searching for pages with the use of substrings. Otherwise I think it's very close to our needs with a similar history of starting with a Latin-only system and extending into Unicode with security concerns, etc.
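For a feel of what Punycode would do to a title, Python ships a `punycode` codec in its standard library. The sketch below only covers the raw encoding step (no Nameprep or stringprep), and the helper names are hypothetical.

```python
def to_punycode(title: str) -> str:
    """Encode a Unicode title as an ASCII-only Punycode string."""
    return title.encode("punycode").decode("ascii")


def from_punycode(encoded: str) -> str:
    """Recover the original Unicode title from its Punycode form."""
    return encoded.encode("ascii").decode("punycode")


if __name__ == "__main__":
    title = "Гильдии"
    encoded = to_punycode(title)
    print(encoded, "->", from_punycode(encoded))
    # Compare the encoding of a prefix with the encoding of the whole title.
    print(to_punycode(title[:5]), "vs", encoded)
```

The substring concern is real: because Punycode encodes positions and code points as deltas, the encoding of a substring is not, in general, a substring of the encoding, so a search would probably need to decode slugs back to titles first.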

@WardCunningham
Member

Would an author want to identify Gil'dii as a synonym for Гильдии?

@paul90
Member Author

paul90 commented Oct 23, 2022

Internationalisation is not just an issue for the client. To ensure that all aspects are covered, it will be best to discuss this subject over in fedwiki/wiki#139, rather than in this closed PR that was just a start at exploring one small corner of this issue.

@replaid

replaid commented Oct 23, 2022

> Would an author want to identify Gil'dii as a synonym for Гильдии?

In terms of its value to the user experience, this would be like asking an English speaker to please type Уики Уики Уеб somewhere as a synonym for Wiki Wiki Web because the author of the software was Russian.

I've created an issue specifically for page titles at fedwiki/wiki#140. I'm optimistic that this can be done in a compatible way without a huge lift.
